104 ◾ Bioinformatics
of orthologs (OrthoDB) to measure genome assembly and annotation completeness. The
quality of a genome assembly is summarized by the following metrics: C for complete, D
for duplicated, F for fragmented, M for missing, and n for the number of genes used in the
assessment. A gene recovered from the de novo assembly is reported as complete (C)
when its length is within two standard deviations of the mean length of the corresponding
genes in the ortholog database. Complete genes found in multiple copies are reported as
duplicated (D), which can indicate that haplotypes were assembled inaccurately. Incomplete
or partially recovered genes are reported as fragmented (F), and genes that are not
recovered are reported as missing (M). The number of genes used (n) reflects the
confidence of the assessment results.
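The classification rules above can be illustrated with a short Python sketch. It is a simplified illustration only, not BUSCO's actual implementation: the function names, the ortholog identifiers, and all lengths are hypothetical, and the sketch assumes the four categories are mutually exclusive (here C means a complete gene found exactly once).

```python
from collections import Counter


def classify_ortholog(hit_lengths, mean_length, sd_length):
    """Classify one ortholog from the lengths of its hits in the assembly.

    Simplifying assumption: a hit counts as complete when its length is
    within two standard deviations of the mean ortholog length.
    """
    if not hit_lengths:
        return "M"  # nothing recovered: missing
    complete = [n for n in hit_lengths if abs(n - mean_length) <= 2 * sd_length]
    if len(complete) > 1:
        return "D"  # several complete copies: duplicated
    if len(complete) == 1:
        return "C"  # exactly one complete copy
    return "F"      # only partial hits: fragmented


def summarize(statuses):
    """Return the percentage of each category and the number of genes n."""
    n = len(statuses)
    if n == 0:
        raise ValueError("no orthologs were assessed")
    counts = Counter(statuses)
    summary = {key: round(100.0 * counts[key] / n, 1) for key in "CDFM"}
    summary["n"] = n
    return summary


# Hypothetical data: ortholog id -> (mean length, standard deviation, hit lengths)
orthologs = {
    "og1": (900.0, 50.0, [910]),
    "og2": (1200.0, 100.0, [1150, 1230]),
    "og3": (600.0, 30.0, [250]),
    "og4": (750.0, 40.0, []),
}

statuses = [classify_ortholog(hits, m, sd) for m, sd, hits in orthologs.values()]
summary = summarize(statuses)
print(summary)
```

With these made-up values, each of the four orthologs falls into a different category, so every percentage is 25.0 and n is 4.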
BUSCO relies on several third-party software packages that must be installed for the
program to run properly. The BUSCO dependencies include Python 3.x, BioPython, pandas,
tBLASTn 2.2+, Augustus 3.2, Prodigal, Metaeuk, HMMER3.1+, SEPP, and R + ggplot2
for the plotting companion script. Not all of these packages are required for every
analysis; some are needed only in particular cases. For the complete installation
instructions, visit the BUSCO website at “https://busco.ezlab.org/busco_userguide.html”.
You can run the following commands on the Linux command line to install some of the
BUSCO third-party dependencies and the BUSCO software on Ubuntu:
sudo apt update && sudo apt upgrade
pip install biopython
pip install pandas
sudo apt-get install ncbi-blast+
sudo apt install augustus augustus-data augustus-doc
sudo apt install prodigal
sudo apt install hmmer
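After running the commands above, it may help to confirm that the dependencies are actually visible to the system. The following Python sketch is one possible check, not part of BUSCO itself: the function name check_dependencies is hypothetical, and the executable names listed (for example, hmmsearch as a representative HMMER program) are assumptions about how the packages are exposed on the PATH. SEPP and R are not checked here.

```python
import importlib.util
import shutil

# Assumed command names for the tools installed above (adjust as needed).
EXECUTABLES = ["tblastn", "augustus", "prodigal", "metaeuk", "hmmsearch"]
# Import names of the Python packages (BioPython is imported as "Bio").
PYTHON_MODULES = ["Bio", "pandas"]


def check_dependencies(executables=EXECUTABLES, modules=PYTHON_MODULES,
                       which=shutil.which,
                       find_spec=importlib.util.find_spec):
    """Return a dict mapping each dependency name to True if it was found."""
    status = {}
    for exe in executables:
        # shutil.which returns None when the command is not on the PATH.
        status[exe] = which(exe) is not None
    for mod in modules:
        # find_spec returns None when a top-level module is not installed.
        status[mod] = find_spec(mod) is not None
    return status


if __name__ == "__main__":
    for name, found in check_dependencies().items():
        print(f"{name:10s} {'found' if found else 'MISSING'}")
```

A dependency reported as MISSING is not necessarily a problem, since some packages are needed only in particular cases.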
The BUSCO software can then be cloned and installed by running the following commands:
git clone https://gitlab.com/ezlab/busco.git
cd busco/
python3 setup.py install --user
FIGURE 3.12 Icarus contig browser displaying de novo assemblies aligned to a reference genome.